Flat Parallelization

Authors

  • Vitaly Aksenov
  • Petr Kuznetsov
Abstract

There are two intertwined factors that affect the performance of concurrent data structures: the ability of processes to access the data in parallel and the cost of synchronization. It has been observed that for a large class of “concurrency-unfriendly” data structures, fine-grained parallelization does not pay off: an implementation based on a single global lock outperforms fine-grained solutions. The flat combining paradigm exploits this by ensuring that a thread holding the global lock sequentially combines requests and then executes the combined requests on behalf of concurrent threads. In this paper, we propose a synchronization technique that unites flat combining and parallel bulk updates borrowed from parallel algorithms designed for the PRAM model. The idea is that the combiner thread assigns waiting threads to perform concurrent requests in parallel. We expect the technique to help in implementing efficient “concurrency-ambivalent” data structures, which can benefit from both parallelism and serialization, depending on the operational context. To validate the idea, we considered implementations of a priority queue. These data structures exhibit two important features: concurrent remove operations are likely to conflict and thus may benefit from combining, while concurrent insert operations can often be at least partly applied in parallel and thus may benefit from parallel batching. We show that the resulting flat parallelization algorithm performs well compared to state-of-the-art priority queue implementations.
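To make the combining idea concrete, the sketch below shows a minimal flat-combining priority queue in Java. It is only an illustration of the general pattern, not the authors' implementation: the class and member names (FlatCombiningPQ, Request, slots) are invented for this example, and the paper's flat-parallelization step, in which the combiner assigns batched insert requests to the waiting threads themselves, is only indicated by a comment.

    import java.util.PriorityQueue;
    import java.util.concurrent.atomic.AtomicReferenceArray;
    import java.util.concurrent.locks.ReentrantLock;

    // Illustrative flat-combining priority queue. Each thread publishes its
    // request in a per-thread slot; whoever acquires the global lock becomes
    // the combiner and serves every pending request against a sequential heap.
    public class FlatCombiningPQ {
        private static final int MAX_THREADS = 64;

        private static final class Request {
            final boolean isInsert;
            final int value;
            volatile Integer result;        // filled in by the combiner
            volatile boolean done = false;
            Request(boolean isInsert, int value) { this.isInsert = isInsert; this.value = value; }
        }

        private final AtomicReferenceArray<Request> slots = new AtomicReferenceArray<>(MAX_THREADS);
        private final ReentrantLock lock = new ReentrantLock();
        private final PriorityQueue<Integer> heap = new PriorityQueue<>();

        public void insert(int threadId, int value) { submit(threadId, new Request(true, value)); }
        public Integer deleteMin(int threadId)      { return submit(threadId, new Request(false, 0)); }

        private Integer submit(int threadId, Request req) {
            slots.set(threadId, req);                // publish the request
            while (!req.done) {
                if (lock.tryLock()) {                // become the combiner
                    try {
                        combine();
                    } finally {
                        lock.unlock();
                    }
                }
                // otherwise spin until some combiner marks this request done
            }
            return req.result;
        }

        // The combiner scans all slots and applies pending requests one by one.
        // In the flat-parallelization scheme described in the abstract, the
        // combiner would instead hand a batch of published inserts back to the
        // waiting threads to apply in parallel; that step is omitted here.
        private void combine() {
            for (int i = 0; i < MAX_THREADS; i++) {
                Request r = slots.get(i);
                if (r != null && !r.done) {
                    if (r.isInsert) {
                        heap.add(r.value);
                    } else {
                        r.result = heap.poll();      // null if the heap is empty
                    }
                    r.done = true;
                    slots.set(i, null);
                }
            }
        }
    }

A caller invokes insert(threadId, value) or deleteMin(threadId) with a unique thread index; the request is either served during another thread's combining pass or by the caller itself once it acquires the lock.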

Similar Articles

Parallelization of Rich Models for Steganalysis of Digital Images using a CUDA-based Approach

There are several different methods to make an efficient strategy for steganalysis of digital images. A very powerful method in this area is the rich model, consisting of a large number of diverse sub-models in both the spatial and transform domains that should be utilized. However, the extraction of various types of features from an image is very time-consuming in some steps, especially for training pha...

Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of scheduling hardware resources for concurrent threads. In this paper, to resolve this problem, a novel method is proposed which parallelizes the GA by designing three concurrent kernels, each of which runs some depe...

Parallel Markov Chain Monte Carlo via Spectral Clustering

As it has become common to use many computer cores in routine applications, finding good ways to parallelize popular algorithms has become increasingly important. In this paper, we present a parallelization scheme for Markov chain Monte Carlo (MCMC) methods based on spectral clustering of the underlying state space, generalizing earlier work on parallelization of MCMC methods by state space par...

Parallel Implementation of Computational Fluid Dynamics Codes on Emerging Architectures

We consider two emerging parallel computing platforms and their suitability for large-scale computational mechanics codes. The CRAY Multi-Threaded Architecture (MTA), featuring custom CPUs and an automatic parallelizing compiler suite, is organized around flat uniform-access shared memory, high-bandwidth connections between CPUs and memory, lightweight synchronization, and context switching between th...

MPI- and CUDA- implementations of modal finite difference method for P-SV wave propagation modeling

Among different discretization approaches, the Finite Difference Method (FDM) is widely used for acoustic and elastic full-waveform modeling. An inevitable deficit of the technique, however, is its severe demand for computational resources. A promising solution is parallelization, where the problem is broken into several segments and the calculations are distributed over different processors. ...


Journal:
  • CoRR

Volume: abs/1705.02851

Pages: -

Publication year: 2017